Designing a belief function-based accessibility 
indicator to improve web browsing 
for disabled people 
Abstract. The purpose of this study is to provide an accessibility measure of webpages, in order to draw disabled users to the pages that have been designed to be accessible to them. Our approach is based on the theory of belief functions, using data supplied by the reports of automatic web content assessors that test the validity of criteria defined by the WCAG 2.0 guidelines proposed by the World Wide Web Consortium (W3C). These tools detect errors with gradual degrees of certainty, and their results do not always converge. For these reasons, to fuse the information coming from the reports, we choose an information fusion framework that can take into account the uncertainty and imprecision of information as well as divergences between sources. Our accessibility indicator covers four categories of deficiencies. To validate the theoretical approach in this context, we propose an evaluation carried out on a corpus of the 100 most visited French news websites with 2 evaluation tools. The results obtained illustrate the interest of our accessibility indicator.


1 Introduction 


The Web is today an essential source of information and communication. While its social, cultural and economic value to users keeps growing, and in spite of legislation and of the W3C community's recommendations for making websites more accessible, the Web remains hard to access for some disabled or ageing users. Indeed, making websites accessible to and usable by disabled people is a challenge […] that society needs to overcome […].
To measure the accessibility of a webpage, several accessibility metrics have been developed […]. Their evaluations are based on failures to comply with the recommendations of standards, detected with automatic evaluation tools, and they often give a final value, continuous or discrete, to represent content accessibility. However, testing accessibility criteria is far from trivial […]. The evaluation reports of automatic assessors contain errors considered certain, but also warnings or potential problems that are uncertain. Moreover, assessor evaluations differ, even for errors considered certain.

This work provides a new measure of accessibility and an information fusion framework that fuses the information coming from the reports of automatic assessors, allowing search engines to re-rank their results according to an accessibility level, as some users would like […]. This accessibility indicator considers several categories of deficiencies. Our approach is based on the theory of belief functions, adapted to take into account the accessibility defects reported by several automatic assessors seen as information sources, the uncertainty of their results, and the possible conflicts between the sources.

Section 2 describes accessibility tools based on a recent standard and the data provided in their reports. Sections 3 and 4 describe the principles of our indicator and how we implement the belief functions. Section 5 presents an experiment before we conclude.


2 Defect detection of webpage accessibility 

Various accessibility standards propose recommendations for improving the accessibility of webpages. The Web Content Accessibility Guidelines (WCAG 2.0) […], proposed by the W3C standardization organization, constitute an international reference in the field. These guidelines cover a wide range of disabilities (visual, auditory, physical, speech, cognitive, etc.) and provide several layers of guidance:

• 4 overall principles: perception, operability, understandability & robustness;

• 12 guidelines organized under these principles;

• testable success criteria: for each guideline, testable success criteria are provided. Every criterion is associated with one of the 3 defined conformance levels (A, AA and AAA), each representing a level of accessibility requirement for users.

Several automatic accessibility assessors, based on various accessibility standards, have been developed […] for IT professionals. Their limits stem from what can be tested automatically: because some criteria concerning the quality of page content cannot currently be tested automatically, some assessor results are ambiguous. Consequently, existing automatic assessors look for the criteria that are not met and report defects at 3 levels of validity: the number of errors, which are considered certain; the number of likely problems (warnings), whose reality is not guaranteed; and the number of potential problems (also called generic or non-testable), which leave the accessibility of the tested criterion completely uncertain.
Finally, even though different assessors agree on some commonly tested criteria, their results can differ, even for errors considered certain.


3 Proposed accessibility indicator 

After a query, the indicator has to supply information describing to users the accessibility level of each webpage proposed by a search engine. Presented alongside these pages, the indicator's information covers two aspects:

• the accessibility for categories of deficiencies: as previously proposed for accessibility estimation […], we use 4 major categories: visual, hearing, motor and cognitive deficiencies, as defined by […]. They are called "deficiency frames";

• the level of accessibility for each deficiency frame.

Collecting results from several assessors allows us to benefit from the performance of each of them. In addition, it strengthens the accessibility evaluation when results are similar and manages conflicts in case of disagreement. Automatic assessors check a set of criteria that correspond to many deficiencies. As our accessibility evaluation differs for each deficiency frame, our method consists in selecting the relevant criteria for each deficiency frame and then weighting each criterion according to the difficulties met by users in case of failure. This weighting is based on the criterion conformance level (A, AA, AAA), which corresponds to decreasing priorities (A being the most important). The errors and problems detected for each criterion of the accessibility standard affect the accessibility indicator of the tested Web content according to the deficiency frame the criterion belongs to, its weight within the frame, the number of occurrences in which it is found defective in the webpage, and the defect's degree of certainty (error, likely or potential problem).
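As an illustration of the selection and weighting step described above, the following Python sketch keeps, for one deficiency frame, only the relevant criteria and attaches their conformance-level weight $a_k$. The weights (1, 0.8 and 0.6 for A, AA and AAA) are those given in Table 1; the criterion-to-frame mapping `CRITERIA` and the function `frame_weights` are hypothetical and do not reproduce the assignment actually used by the authors.

```python
# Conformance-level weights a_k (values of Table 1: A = 1, AA = 0.8, AAA = 0.6).
LEVEL_WEIGHT = {"A": 1.0, "AA": 0.8, "AAA": 0.6}

# Hypothetical excerpt: WCAG 2.0 success criterion -> (level, deficiency frames).
# The frame assignment is illustrative only, not the authors' actual mapping.
CRITERIA = {
    "1.1.1": ("A", {"visual", "cognitive"}),  # Non-text Content
    "1.2.2": ("A", {"hearing"}),              # Captions (Prerecorded)
    "2.1.1": ("A", {"visual", "motor"}),      # Keyboard
    "1.4.3": ("AA", {"visual"}),              # Contrast (Minimum)
    "3.1.5": ("AAA", {"cognitive"}),          # Reading Level
}


def frame_weights(frame):
    """Select the criteria relevant to `frame` and return {criterion: a_k}."""
    return {
        crit: LEVEL_WEIGHT[level]
        for crit, (level, frames) in CRITERIA.items()
        if frame in frames
    }
```

For example, `frame_weights("visual")` returns `{"1.1.1": 1.0, "2.1.1": 1.0, "1.4.3": 0.8}`: a criterion may belong to several frames, and its weight depends only on its conformance level.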


4 Defect detection and accessibility evaluation 

After collecting the Uniform Resource Locators (URLp) of the webpages selected by a search engine for a query, these addresses are supplied to the accessibility assessors. Successively for each page, we detect accessibility defects, then estimate the accessibility level by deficiency frame for each assessor, before fusing the data by deficiency frame and taking the decision for every deficiency frame […].
4.1 Assessor evaluations of selected pages

Each URLp is submitted to the accessibility evaluation tests of each assessor i, which tests all the criteria k of the WCAG 2.0 standard, and the following data are collected by a filter that extracts the required data for each deficiency frame:

• $N^e_{k,i}$: errors observed for a criterion k by an assessor i;

• $N^c_{k,i}$: correct checkpoints for a criterion k by an assessor i;

• $T^e_{k,i}$: tests that can induce errors for a criterion k by an assessor i;

• $N^l_{k,i}$: likely problems detected for a criterion k by an assessor i;

• $T^l_{k,i}$: tests that can induce likely problems for a criterion k by an assessor i;

• $N^p_{k,i}$: potential problems suspected for a criterion k by an assessor i;

• $T^p_{k,i}$: tests that can induce potential problems for a criterion k by an assessor i;

• $T_i$: total tests by an assessor i, with:

$$T_i = \sum_k \left( N^c_{k,i} + T^e_{k,i} + T^l_{k,i} + T^p_{k,i} \right) \qquad (1)$$
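A possible container for the counts listed above is sketched below, assuming that $T_i$ sums, over all criteria, the correct checkpoints and the three kinds of test counts. The class `CriterionCounts`, its field names and the function `total_tests` are hypothetical; real assessor reports (AChecker, TAW) would need their own parsing.

```python
from dataclasses import dataclass


@dataclass
class CriterionCounts:
    """Counts extracted from assessor i's report for one criterion k."""
    n_err: int = 0     # N^e_{k,i}: errors observed
    n_ok: int = 0      # N^c_{k,i}: correct checkpoints
    t_err: int = 0     # T^e_{k,i}: tests that can induce errors
    n_likely: int = 0  # N^l_{k,i}: likely problems detected
    t_likely: int = 0  # T^l_{k,i}: tests that can induce likely problems
    n_pot: int = 0     # N^p_{k,i}: potential problems suspected
    t_pot: int = 0     # T^p_{k,i}: tests that can induce potential problems


def total_tests(report):
    """T_i for one assessor, assuming T_i = sum_k (N^c + T^e + T^l + T^p).

    `report` maps each criterion k to its CriterionCounts.
    """
    return sum(c.n_ok + c.t_err + c.t_likely + c.t_pot for c in report.values())
```

Keeping one record per criterion makes it easy to restrict later sums to the criteria of a given deficiency frame.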


4.2 Accessibility indicator level of the pages 

To model the initial information, including its uncertainties, the reliability of the assessors seen as information sources and their possible conflicts, we use the theory of belief functions […]. Our objective is to determine whether a webpage is accessible ($Ac$) or not accessible ($\overline{Ac}$) and to supply an indication by deficiency frame. Consequently, these questions can be handled independently for every deficiency frame $\Omega_h = \{Ac, \overline{Ac}\}$, whose power set is $2^{\Omega_h} = \{\emptyset, Ac, \overline{Ac}, \Omega_h\}$.

The accessibility $Ac$ for a deficiency frame h and a source i (assessor) is estimated from the number of correct tests for each of the criteria k occurring in this frame, and from their conformance level, represented by the weight $a_k$:

$$E(Ac)_{h,i} = \frac{\sum_k N^c_{k,i}\, a_k}{T_i} \qquad (2)$$


The non-accessibility $\overline{Ac}$ for a deficiency frame h and a source i is estimated from the number of errors for each of the criteria k occurring in this frame, and from the $a_k$ coefficient. A weakening coefficient $\beta^e$ is also introduced to model the degree of certainty of the errors:

$$E(\overline{Ac})_{h,i} = \frac{\sum_k N^e_{k,i}\, a_k\, \beta^e}{\sum_k T^e_{k,i}} \qquad (3)$$


The ignorance $\Omega_h$ for a deficiency frame h and a source i is estimated from the numbers of likely and potential problems for each of the criteria k occurring in this frame, and from the $a_k$ coefficient. The weakening coefficients $\beta^l$ and $\beta^p$ are also used to model the degree of certainty of the problems:

$$E(\Omega)_{h,i} = \frac{\sum_k a_k \left( N^l_{k,i}\, \beta^l + N^p_{k,i}\, \beta^p \right)}{\sum_k \left( T^l_{k,i} + T^p_{k,i} \right)} \qquad (4)$$
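Eqs. (2) to (4) can be sketched in Python for one frame h and one assessor i. The dictionary layout of `counts` and the name `estimations` are hypothetical; the default weakening coefficients are the values of Table 1 ($\beta^e = 1$, $\beta^l = 0.5$, $\beta^p = 1$), and returning 0 when a denominator is empty is our own assumption rather than part of the method.

```python
def estimations(counts, weights, t_total, beta_e=1.0, beta_l=0.5, beta_p=1.0):
    """Return (E(Ac), E(not Ac), E(Omega)) for one frame h and one assessor i.

    counts:  {criterion k: {"n_ok", "n_err", "t_err", "n_likely",
              "t_likely", "n_pot", "t_pot": int}}  (hypothetical layout)
    weights: {criterion k: a_k} for the criteria of frame h only
    t_total: T_i, the total number of tests run by assessor i
    """
    ks = [k for k in weights if k in counts]  # tested criteria of frame h

    # Eq. (2): weighted correct checkpoints over all tests of the assessor.
    e_ac = sum(counts[k]["n_ok"] * weights[k] for k in ks) / t_total

    # Eq. (3): weighted, weakened errors over the tests that can induce errors.
    t_err = sum(counts[k]["t_err"] for k in ks)
    e_not_ac = (sum(counts[k]["n_err"] * weights[k] * beta_e for k in ks) / t_err
                if t_err else 0.0)  # assumption: no such test -> no evidence

    # Eq. (4): weighted likely and potential problems over the related tests.
    t_unc = sum(counts[k]["t_likely"] + counts[k]["t_pot"] for k in ks)
    e_omega = (sum(weights[k] * (counts[k]["n_likely"] * beta_l
                                 + counts[k]["n_pot"] * beta_p) for k in ks) / t_unc
               if t_unc else 0.0)  # same assumption as above
    return e_ac, e_not_ac, e_omega
```

For a single criterion with weight 1, 8 correct checkpoints out of $T_i = 20$ tests, 2 errors out of 10 error tests, and 1 likely plus 2 potential problems out of 8 related tests, this gives $E(Ac) = 0.4$, $E(\overline{Ac}) = 0.2$ and $E(\Omega) = (0.5 + 2)/8 = 0.3125$.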


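The mass functions of the subsets of $2^{\Omega_h}$ are computed from the estimations:

$$m(Ac)_{h,i} = \frac{E(Ac)_{h,i}}{E(Ac)_{h,i} + E(\overline{Ac})_{h,i} + E(\Omega)_{h,i}} \qquad (5)$$

$$m(\overline{Ac})_{h,i} = \frac{E(\overline{Ac})_{h,i}}{E(Ac)_{h,i} + E(\overline{Ac})_{h,i} + E(\Omega)_{h,i}} \qquad (6)$$

$$m(\Omega)_{h,i} = \frac{E(\Omega)_{h,i}}{E(Ac)_{h,i} + E(\overline{Ac})_{h,i} + E(\Omega)_{h,i}} \qquad (7)$$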
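Eqs. (5) to (7) simply normalise the three estimations so that they sum to 1. A minimal sketch follows; the name `masses`, the string keys, and the choice to return total ignorance when all estimations are zero are assumptions made for illustration.

```python
def masses(e_ac, e_not_ac, e_omega):
    """Eqs. (5)-(7): normalise the estimations into masses on Ac, not-Ac, Omega."""
    total = e_ac + e_not_ac + e_omega
    if total == 0:
        # Assumption: with no evidence at all, return the vacuous mass function.
        return {"Ac": 0.0, "notAc": 0.0, "Omega": 1.0}
    return {
        "Ac": e_ac / total,
        "notAc": e_not_ac / total,
        "Omega": e_omega / total,
    }
```

Because $E(\Omega)$ appears in every denominator, a larger number of uncertain problems lowers $m(Ac)$ even when the error counts are unchanged.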


In addition, the reliability of the sources can be modeled […] with a coefficient $\delta_i$, which is beneficial when some assessors are more efficient than others:

$$m^{\delta_i}(Ac)_{h,i} = \delta_i\, m(Ac)_{h,i}, \qquad m^{\delta_i}(\overline{Ac})_{h,i} = \delta_i\, m(\overline{Ac})_{h,i} \qquad (8)$$
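Eq. (8) rescales the masses of $Ac$ and $\overline{Ac}$. In the classical discounting operation, the mass removed in this way is transferred to the ignorance, i.e. $m^{\delta_i}(\Omega)_{h,i} = 1 - \delta_i + \delta_i\, m(\Omega)_{h,i}$; the sketch below assumes that convention so that the discounted masses still sum to 1. The function name `discount` is hypothetical.

```python
def discount(m, delta):
    """Discount a mass function {"Ac", "notAc", "Omega"} by reliability delta.

    Ac and notAc are scaled by delta as in Eq. (8); the removed mass goes to
    Omega (classical discounting convention, assumed here).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return {
        "Ac": delta * m["Ac"],
        "notAc": delta * m["notAc"],
        "Omega": 1.0 - delta + delta * m["Omega"],
    }
```

With the reliabilities of Table 1 ($\delta_1 = \delta_2 = 1$), discounting leaves both assessors' masses unchanged; a value of $\delta_i < 1$ would move part of an assessor's belief to $\Omega_h$.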


4.3 Merging assessor results and decision-making 

Once the masses have been obtained for each assessor, the results are fused by deficiency frame with the conjunctive rule […], which combines them into a single mass function. The properties of this rule, which strengthens common results and manages conflicts between sources, are particularly relevant in this context to deal with divergences between assessor evaluations. To compute the final decision $D_h(URLp)$ for a page by deficiency frame, we use the pignistic probability […].
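The fusion and decision steps above can be sketched as follows, assuming that each assessor's (possibly discounted) masses are given on the focal elements $Ac$, $\overline{Ac}$ and $\Omega_h$, and that the decision $D_h(URLp)$ is read as the pignistic probability of $Ac$ (our reading; the text does not spell it out). The function names are hypothetical; the formulas are the standard unnormalised conjunctive rule and Smets' pignistic transform, with the conflict mass renormalised away.

```python
from itertools import product

AC, NOT_AC = "Ac", "notAc"
OMEGA = frozenset({AC, NOT_AC})  # the frame Omega_h; frozenset() is the empty set


def conjunctive(m1, m2):
    """Unnormalised conjunctive rule: m(A) = sum of m1(B) * m2(C) over B & C == A.

    Mass functions are dicts {frozenset focal element: mass}; the mass that
    ends up on frozenset() is the conflict between the two sources.
    """
    fused = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        fused[a] = fused.get(a, 0.0) + mb * mc
    return fused


def pignistic(m):
    """Pignistic probability BetP, renormalising away the conflict m(empty set)."""
    conflict = m.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: BetP is undefined")
    bet = {AC: 0.0, NOT_AC: 0.0}
    for focal, mass in m.items():
        for x in focal:  # the empty set has no element, so it is skipped
            bet[x] += mass / (len(focal) * (1.0 - conflict))
    return bet
```

For instance, fusing $m_1 = (0.6, 0.1, 0.3)$ and $m_2 = (0.5, 0.2, 0.3)$ on $(Ac, \overline{Ac}, \Omega_h)$ gives 0.63 on $Ac$, 0.11 on $\overline{Ac}$, 0.09 on $\Omega_h$ and a conflict of 0.17: the agreement of the two sources on $Ac$ is reinforced, and the decision is then `pignistic(conjunctive(m1, m2))[AC]`.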

There are several ways of presenting the accessibility indicator to users. To visualize the deficiency frames, existing specific pictograms are effective. To present the accessibility level, we discretize the decision into 5 levels (very good, good, moderate, bad or very bad accessibility) using thresholds $S_1 < S_2 < S_3 < S_4$, and visualize it with an arrow:

• if $D_h < S_1$, the Web content accessibility is very bad (↓);

• if $S_1 \le D_h < S_2$, the Web content accessibility is bad (↘);

• if $S_2 \le D_h < S_3$, the Web content accessibility is moderate (→);

• if $S_3 \le D_h < S_4$, the Web content accessibility is good (↗);

• if $S_4 \le D_h$, the Web content accessibility is very good (↑).
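The discretization rule above maps directly onto a binary search over the sorted thresholds. In the sketch below, the default thresholds are those of Table 1; the names `accessibility_level`, `LEVELS` and `ARROWS` are hypothetical, and a value exactly equal to a threshold is assigned to the higher level, as in the rule above.

```python
from bisect import bisect_right

THRESHOLDS = (0.6, 0.7, 0.8, 0.9)  # S1..S4, values of Table 1
LEVELS = ("very bad", "bad", "moderate", "good", "very good")
ARROWS = ("↓", "↘", "→", "↗", "↑")


def accessibility_level(d_h, thresholds=THRESHOLDS):
    """Map a decision D_h to (level, arrow) using increasing thresholds S1..S4.

    bisect_right counts the thresholds <= d_h, so a value exactly equal to a
    threshold falls into the higher level.
    """
    idx = bisect_right(thresholds, d_h)
    return LEVELS[idx], ARROWS[idx]
```

With these values, the global results 0.971 and 0.627 reported in Table 2 map to `("very good", "↑")` and `("bad", "↘")` respectively.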
5 Experiments 

To validate our approach, we present the results obtained on a set of 100 news websites, among the most visited ones, all referenced by the OJD¹ organization, which certifies and publishes attendance figures for websites. We test their homepages, following a study […] concluding that the usability of a homepage is predictive of that of the whole site. We chose two open-source assessors, AChecker (source 1) […] and TAW (source 2), from which we automatically extract the accessibility test results. The weight and threshold values given in Table 1 were previously defined empirically from webpages² assumed to be accessible.


Group        Symbols              Meaning                                    Values
Weightings   a1 ; a2 ; a3         A, AA, AAA conformance levels              1 ; 0.8 ; 0.6
             β^e ; β^l ; β^p      Certainty levels of errors or problems     1 ; 0.5 ; 1
             δ1 ; δ2              AChecker and TAW reliabilities (sources)   1 ; 1
Thresholds   S1 ; S2 ; S3 ; S4    Accessibility indicator levels             0.6 ; 0.7 ; 0.8 ; 0.9

Table 1: Constant values for our accessibility metric.

The results of these sources are summarized in Figure 1 for the 3 levels of defect certainty. The box plots show how the defects are distributed: minimum and maximum (whiskers), 1st (bottom of the box) and 3rd quartiles (top of the box), and average (horizontal line). We observe similarities between the assessors' results for the errors detected as certain, but large differences for the likely (warnings) and potential (non-testable) problems: AChecker reports almost no likely problems, while the number of potential problems reported by TAW is always the same.


[Figure 1: three box-plot panels (Errors, Likely problems, Potential problems) comparing TAW and AChecker on the number of detected defects.]

Fig. 1: Results of automatic assessors.

The detected defects are taken into account in our accessibility indicator results, presented in Figure 2. The mass function values of accessibility $m(Ac)$ for the 2 sources, TAW and AChecker, and the fusion result are visualized for 3 of the 4 deficiency frames, and globally for all deficiencies. First, we can see that $m(Ac)$ differs markedly between the 2 sources: their distributions of errors (Figure 1) are comparable, even if the range is larger for AChecker, yet the mass function of accessibility is smaller for AChecker than for TAW. This is due to the more numerous potential problems (non-testable criteria) detected by the AChecker assessor, which substantially increase the denominator in the computation of $m(Ac)$ (Eq. 5). Moreover, the values of $E(\Omega)$, and consequently of $m(\Omega)$, are larger because the weight $\beta^p$ for potential problems is 2 times higher than $\beta^l$ for likely problems (warnings). We can also notice that the fusion result obtained by the conjunctive rule strengthens the mass functions of the 2 assessors.

¹ OJD: http://www.ojd.com/Chiffres/Le-Numerique/Sites-Web/Sites-Web-GP
² Sites labeled by Accessiweb: http://www.accessiweb.org/index.php/galerie.html
Fig. 2: Accessibility indicator results.

In this corpus, visual and cognitive deficiencies have a higher impact on content accessibility than motor ones. This is logical for news websites, as their homepages include a large number of images. The motor indicator is less affected, in particular by the lack of alternatives for images, which mainly matters for visual and cognitive deficiencies. Finally, we observe a similarity between the visual and global indicators, as around 80% of all the checkpoints concern visual deficiencies and these checks are properly handled by the assessors.


                     Decision by deficiency frame
Web content (URLp)   Visual     Motor      Cognitive   Global
LeParisien.fr        0.972 ↑    0.989 ↑    0.974 ↑     0.971 ↑
Famili.fr            0.769 →    0.924 ↑    0.838 ↗     0.766 →
Arte.tv              0.701 →    0.718 →    0.717 →     0.686 ↘
LePoint.fr           0.630 ↘    0.725 →    0.673 ↘     0.627 ↘

Table 2: Examples of detailed accessibility results by deficiency frame.

Table 2 presents detailed results for several sites with significant differences in the indicator. For example, LePoint.fr and Arte.tv, respectively the 19th and 33rd most consulted websites in France, obtain only 0.627 and 0.686 for the global result, whereas LeParisien.fr, ranked 12th, reaches 0.971. For Famili.fr, we observe differences between the deficiencies; nevertheless, over the whole corpus, attention to accessibility generally benefits all deficiencies.


6 Conclusion 

We present an indicator estimating webpage accessibility levels for distinct categories of deficiencies, in order to supply easily understandable accessibility information to users on the pages proposed by a search engine. Our method, based on belief function theory, fuses the results of several automatic assessors and takes their uncertainties into account. We propose an accurate modeling of the assessor characteristics and of the impact of failed guideline criteria on accessibility. An experiment performed on a set of 100 news websites validates the method, which benefits from the performance of each assessor on specific criterion tests. Our future research will focus on implementing a user's personal weighting to balance the importance of criteria.